Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Existing visual reasoning datasets, such as Visual Question Answering (VQA),
often suffer from biases conditioned on the question, image or answer
distributions. The recently proposed CLEVR dataset addresses these limitations
and requires fine-grained reasoning but the dataset is synthetic and consists
of similar objects and sentence structures across the dataset.
In this paper, we introduce a new inference task, Visual Entailment (VE) -
consisting of image-sentence pairs whereby a premise is defined by an image,
rather than a natural language sentence as in traditional Textual Entailment
tasks. The goal of a trained VE model is to predict whether the image
semantically entails the text. To realize this task, we build a dataset SNLI-VE
based on the Stanford Natural Language Inference corpus and Flickr30k dataset.
We evaluate various existing VQA baselines and build a model called Explainable
Visual Entailment (EVE) system to address the VE task. EVE achieves up to 71%
accuracy and outperforms several other state-of-the-art VQA based models.
Finally, we demonstrate the explainability of EVE through cross-modal attention
visualizations. The SNLI-VE dataset is publicly available at
https://github.com/necla-ml/SNLI-VE
Visual Entailment Task for Visually-Grounded Language Learning
We introduce a new inference task, Visual Entailment (VE), which differs
from traditional Textual Entailment (TE) tasks in that the premise is defined by
an image rather than a natural language sentence. A novel
dataset SNLI-VE (publicly available at https://github.com/necla-ml/SNLI-VE) is
proposed for VE tasks based on the Stanford Natural Language Inference corpus
and Flickr30k. We introduce a differentiable architecture called the
Explainable Visual Entailment model (EVE) to tackle the VE problem. EVE and
several other state-of-the-art visual question answering (VQA) based models are
evaluated on the SNLI-VE dataset, facilitating grounded language understanding
and providing insights on how modern VQA based models perform.
Comment: 4 pages, accepted by Visually Grounded Interaction and Language
(ViGIL) workshop in NeurIPS 201
Recommended from our members
MicroMAIS: executing and orchestrating Web services on constrained mobile devices
Mobile devices, with their increasingly powerful resources, allow the development of mobile information systems in which services are not only provided by traditional systems but also autonomously executed and controlled on the mobile devices themselves. Services distributed on autonomous mobile devices allow both the development of cooperative applications without a back-end infrastructure and the development of applications blending distributed and centralized services. In this paper, we propose MicroMAIS: an integrated platform for supporting the execution of Web service-based applications natively on a mobile device. The MicroMAIS platform is composed of mAS and μ-BPEL. The former allows the execution of a single Web service, whereas the latter permits the orchestration of several Web services according to the WS-BPEL standard.
COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
Group Activity Recognition detects the activity collectively performed by a
group of actors, which requires compositional reasoning of actors and objects.
We approach the task by modeling the video as tokens that represent the
multi-scale semantic concepts in the video. We propose COMPOSER, a Multiscale
Transformer based architecture that performs attention-based reasoning over
tokens at each scale and learns group activity compositionally. In addition,
prior works suffer from scene biases with privacy and ethical concerns. We only
use the keypoint modality which reduces scene biases and prevents acquiring
detailed visual data that may contain private or biased information of users.
We improve the multiscale representations in COMPOSER by clustering the
intermediate scale representations, while maintaining consistent cluster
assignments between scales. Finally, we use techniques such as auxiliary
prediction and data augmentations tailored to the keypoint signals to aid model
training. We demonstrate the model's strength and interpretability on two
widely-used datasets (Volleyball and Collective Activity). COMPOSER achieves up
to +5.4% improvement with just the keypoint modality. Code is available at
https://github.com/hongluzhou/composer
Comment: ECCV 202
Identification and phylogenetic comparison of p53 in two distinct mussel species (Mytilus)
Author Posting. © The Authors, 2005. This is the author's version of the work. It is
posted here by permission of Elsevier B. V. for personal use, not for redistribution. The
definitive version was published in Comparative Biochemistry and Physiology Part C: Toxicology & Pharmacology 140 (2005): 237-250, doi:10.1016/j.cca.2005.02.011.
The extent to which humans and wildlife are exposed to anthropogenic challenges is an important focus of environmental research. Potential use of p53 gene family marker(s) for aquatic environmental effects monitoring is the long-term goal of this research. The p53 gene is a tumor suppressor gene that is fundamental in cell cycle control and apoptosis. It is mutated or differentially expressed in about 50% of all human cancers and p53 family members are differentially expressed in leukemic clams. Here, we report the identification and characterization of the p53 gene in two species of Mytilus, Mytilus edulis and Mytilus trossulus, using RT-PCR with degenerate and specific primers to conserved regions of the gene. The Mytilus p53 proteins are 99.8% identical and closely related to clam (Mya) p53. In particular, the 3′ untranslated regions were examined to gain understanding of potential post-transcriptional regulatory pathways of p53 expression. We found nuclear and cytoplasmic polyadenylation elements, adenylate/uridylate-rich elements, and a K-box motif previously identified in other, unrelated genes. We also identified a new motif in the p53 3′UTR which is highly conserved across vertebrate and invertebrate species. Differences between the p53 genes of the two Mytilus species may be part of genetic determinants underlying variation in leukemia prevalence and/or development, but this requires further investigation. In conclusion, the conserved regions in these p53 paralogues may represent potential control points in gene expression.
This information provides a critical first step in the evaluation of p53 expression as a potential marker for environmental assessment.
AFM was supported by the Greater Vancouver Regional District, BC, Canada, and RLC was supported by STAR grant R82935901 from the Environmental Protection Agency (USA).
Phosphoproteomics reveals that Parkinson’s disease kinase LRRK2 regulates a subset of Rab GTPases
Mutations in Park8, encoding the multidomain Leucine-rich repeat kinase 2 (LRRK2) protein, comprise the predominant genetic cause of Parkinson's disease (PD). G2019S, the most common amino acid substitution, activates the kinase two- to threefold. This has motivated the development of LRRK2 kinase inhibitors; however, poor consensus on physiological LRRK2 substrates has hampered clinical development of such therapeutics. We employ a combination of phosphoproteomics, genetics, and pharmacology to unambiguously identify a subset of Rab GTPases as key LRRK2 substrates. LRRK2 directly phosphorylates these both in vivo and in vitro on an evolutionarily conserved residue in the switch II domain. Pathogenic LRRK2 variants mapping to different functional domains increase phosphorylation of Rabs, and this strongly decreases their affinity to regulatory proteins including Rab GDP dissociation inhibitors (GDIs). Our findings uncover a key class of bona-fide LRRK2 substrates and a novel regulatory mechanism of Rabs that connects them to PD.